Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Posters

Posters at ISMB 2020 will be presented virtually. Authors will pre-record their poster talk (5-7 minutes) and upload it to the virtual conference platform along with a PDF of their poster. All registered conference participants will have access to the posters and presentations throughout the conference, and the content will remain available until October 31, 2020. A chat function provides Q&A opportunities for interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally authors should be available for interactive chat during the times noted below:

Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time (EDT)
Poster Session B: July 15 and July 16, 7:45 am - 9:15 am EDT
July 14, 10:40 am - 2:00 pm EDT
A framework for identifying representative and differential chromatin state annotations within and across groups of samples
COSI: RegSys COSI
  • Ha Vu, University of California, United States
  • Zane Koch, University of California, United States
  • Petko Fiziev, Illumina, Inc., United States
  • Jason Ernst, UCLA, United States

Short Abstract: Genome-wide maps of epigenetic modifications provide powerful resources for genome annotation. Maps of epigenetic marks have been integrated into widely used cell-type-specific ‘chromatin-state’ annotations. In many cases, given a group of biologically similar samples, it is desirable to have a chromatin-state annotation that summarizes the annotations of those samples. However, determining an effective summary annotation is challenging: there is no explicit notion of similarity between states, while in practice some states are more biologically similar than others.
Here, we developed CSREP, a method that accepts a set of chromatin-state annotations from a group of samples and probabilistically estimates the group’s most representative annotation. CSREP trains a logistic regression classifier to predict the chromatin-state assignment of each sample given the equivalent annotations from the other samples, then averages the prediction probabilities. This enables it to implicitly learn a notion of distance between states. Additionally, the difference between two groups’ representative chromatin-state maps helps identify differential chromatin regions. We designed a permutation-based test to statistically evaluate those differences.
We applied CSREP to groups of reference epigenomes from the Roadmap Epigenomics project. We demonstrate advantages of CSREP over a baseline method for this application. We also show that CSREP can identify biologically relevant differences between groups with greater power than previous approaches.
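
As a rough illustration of what a permutation-based test of a group difference involves, the sketch below shuffles group labels and recomputes a test statistic. It is not CSREP's actual procedure: the per-sample scores, the statistic (absolute difference of group means), and the function name `permutation_p_value` are all assumptions made for this example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    group_a, group_b: hypothetical per-sample scores at one genomic region.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        # Reassign samples to the two groups at random.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if stat >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

A small p-value indicates that the observed difference is rarely matched when group membership is assigned at random.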

A new approach for expression quantitative trait loci (eQTL) identification using HiChIP
COSI: RegSys COSI
  • Sourya Bhattacharyya, La Jolla Institute for Immunology, United States
  • Vivek Chandra, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States
  • Pandurangan Vijayanand, La Jolla Institute for Immunology, United States

Short Abstract: Expression quantitative trait loci (eQTL) studies analyze the association between genetic variants and changes in gene expression. The majority of these eQTLs reside in noncoding DNA sequences and are inherited as dense haploblocks, making it challenging to assign them to specific genes and, in turn, to disease associations. Conventional cis-eQTL studies such as GTEx are limited to SNPs within ±1 Mb of the TSS of the gene tested due to the multiple testing burden, although recent developments in high-throughput chromatin conformation capture techniques show that enhancers can regulate distal target genes through long-range interactions beyond 1 Mb. Conventional studies also indiscriminately test every SNP within 1 Mb regardless of any functional evidence as to whether these variants overlap with putative regulatory regions. We addressed both of these problems by developing a new way of eQTL mapping with the help of HiChIP data, namely promoter-interacting eQTLs or pieQTLs, such that only the genetic variants that overlap active cis-regulatory elements and interact with the promoter of a gene are tested for association with that gene. Using CRISPRi and CRISPR-mediated homology-directed recombination (HDR), we have shown that pieQTLs are likely to be enriched for functional/causative QTLs since cis-regulatory elements that overlap pieQTLs function as enhancers for their target eGenes.
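
The candidate-selection idea in this abstract (test a variant against a gene only if it overlaps a regulatory element that interacts with that gene's promoter) can be sketched as a simple interval filter. This is an illustration under assumed inputs, not the authors' pipeline: the tuple layouts, the half-open coordinate convention, and the function name `candidate_pairs` are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(variants, promoter_interacting_elements):
    """Return sorted (variant_id, gene) pairs worth testing for association.

    variants: iterable of (variant_id, chrom, position).
    promoter_interacting_elements: iterable of (chrom, start, end, gene),
        where [start, end) is an active regulatory element that contacts
        the promoter of `gene` (hypothetical, e.g. derived from HiChIP loops).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene in promoter_interacting_elements:
        by_chrom[chrom].append((start, end, gene))

    pairs = set()
    for variant_id, chrom, pos in variants:
        for start, end, gene in by_chrom.get(chrom, ()):
            if start <= pos < end:
                pairs.add((variant_id, gene))
    return sorted(pairs)
```

Restricting tests to such pairs reduces the number of hypotheses compared with testing every SNP within a fixed window, and it places no distance limit on the variant-gene pair.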

A single-cell RNA expression map of human coronavirus entry factors
COSI: RegSys COSI
  • Vikas Bansal, DZNE, Germany
  • Manvendra Singh, Cornell University, United States
  • Cédric Feschotte, Cornell University, United States

Short Abstract: To predict the tropism of human coronaviruses, we profile 28 SARS-CoV-2 and coronavirus-associated receptors and factors (SCARFs) using single-cell RNA-sequencing data from a wide range of healthy human tissues. SCARFs include cellular factors both facilitating and restricting viral entry. Among adult organs, enterocytes and goblet cells of the small intestine and colon, kidney proximal tubule cells, and gallbladder basal cells appear most permissive to SARS-CoV-2, consistent with clinical data. Our analysis also suggests alternate entry paths for SARS-CoV-2 infection of the lung, central nervous system, and heart. We predict spermatogonial cells and prostate endocrine cells, but not ovarian cells, to be highly permissive to SARS-CoV-2, suggesting male-specific vulnerabilities. Early stages of embryonic and placental development show a moderate risk of infection. The nasal epithelium looks like another battleground, characterized by high expression of both promoting and restricting factors and a potential age-dependent shift in SCARF expression. Lastly, SCARF expression appears broadly conserved across human, chimpanzee and macaque organs examined. Our study establishes an important resource for investigations of coronavirus biology and pathology.

Advancing insights in the molecular adaptation & regulation of Leishmania by multi-omic integration
COSI: RegSys COSI
  • Bart Cuypers, University of Antwerp, Belgium
  • Jean-Claude Dujardin, Institute of Tropical Medicine, Antwerp, Belgium
  • Pieter Meysman, University of Antwerp, Belgium
  • Ionas Erb, Centre for Genomic Regulation, Spain
  • Wout Bittremieux, Dorrestein Laboratory — University of California San Diego, United States
  • Dirk Valkenborg, University of Hasselt, Belgium
  • Geert Baggerman, University of Antwerp, Belgium
  • Inge Mertens, University of Antwerp, Belgium
  • Cedric Notredame, Centre for Genomic Regulation, Spain
  • Malgorzata Domagalska, Institute of Tropical Medicine, Belgium
  • Kris Laukens, University of Antwerp, Belgium

Short Abstract: Trypanosomatids are protozoan parasites responsible for a range of diseases in humans and animals, including leishmaniasis (Leishmania spp.), sleeping sickness and Chagas disease. Despite this major relevance and a steadily rising number of ‘omic studies, many of the basic molecular regulation mechanisms of these organisms remain only marginally understood, including the process of protein expression. Indeed, unlike other eukaryotes, these parasites have no RNA polymerase II promoters and are therefore unable to regulate transcription of individual genes with transcription factors and feedback loops. Here, we show how integrative multi-omic studies can yield new fundamental insights into these molecular adaptation and regulation mechanisms. We demonstrate that gene dosage, and particularly aneuploidy, in Leishmania correlates remarkably well with transcript and protein abundance, confirming its presumed adaptational importance. Interestingly, a limited but significant subset of proteins seemed to be able to compensate for these gene dosage changes and retained a normal abundance. Despite the absence of transcriptional regulation, the parasite showed major changes in transcript and protein abundance between life stages, and we reveal a potential control mechanism through mRNA UTR length regulation. In summary, we report the first study and workflow integrating genomics, transcriptomics and proteomics in Leishmania, applicable to many other non-model species.

An information-theoretic approach to the de novo discovery of DNA structural motifs
COSI: RegSys COSI
  • Michael Wolfe, University of Michigan, United States
  • Peter Freddolino, University of Michigan, United States

Short Abstract: Determining the binding locations of transcriptional regulators is paramount to understanding gene regulation. Recent studies have shown that some DNA-binding proteins prefer structural elements rather than a simple recognition of the Watson-Crick face of the bases (Rohs et al., 2009). Using high-throughput prediction of local DNA structure (DNAshapeR, Chiu et al., 2016), previous work has combined sequence features with co-occurring DNA structure to improve binding prediction for many transcriptional regulators (Mathelier et al., 2016). Despite this success, only a few approaches have been developed for the de novo discovery of short motifs in DNA structural information from either ChIP-seq (Samee et al., 2019; Yang et al., 2019) or SELEX data (Pal et al., 2019). Here, we present a versatile workflow for discovering short structural motifs that explain the binding preferences of transcriptional regulators. To accomplish this, we couple DNAshapeR with our own information-theoretic approach inspired by the FIRE algorithm for discovering sequence motifs (Elemento et al., 2007). We applied our algorithm to a small sample of proteins with ChIP-seq data in the ENCODE datasets and found that a pure structure motif outperformed a linear PWM in at least 30% of cases, and made informative contributions to a hybrid sequence/structure motif in many more.

BART Cancer: A web resource for transcriptional regulators in cancer genomes
COSI: RegSys COSI
  • Zack Thomas, University of Virginia, United States
  • Chongzhi Zang, University of Virginia, United States

Short Abstract: Dysregulation of gene expression plays an important role in cancer development. Identifying transcriptional regulators that drive oncogenic gene expression programs is a critical task in cancer research. Using Binding Analysis for Regulation of Transcription (BART), a computational method for predicting transcriptional regulators from a putative target gene set (Wang et al. Bioinformatics 2018), we developed an integrative approach to gain insight into gene regulatory networks in cancer. By integrating over 10,000 gene expression profiling RNA-seq datasets from The Cancer Genome Atlas (TCGA) with over 7,000 ChIP-seq datasets from the Cistrome database and the public domain, we predicted putative transcriptional regulators that are responsible for up- and down-regulated genes in cancer samples compared to normal samples for 15 different cancer types. We built the BART Cancer database, an interactive web resource that displays the prediction results and the activities of over 900 transcriptional regulators across cancer types (faculty.virginia.edu/zanglab/bartcancer/). BART Cancer provides insights into epigenetic and transcriptional regulation of cancer gene expression and can be a useful resource for the cancer research community.

Base-resolution predictive models of genomic transcription factor binding profiles can learn the thermodynamics of DNA-protein interactions
COSI: RegSys COSI
  • Amr Alexandari, Stanford University, United States
  • Connor Horton, Stanford University, United States
  • Eileen Li, Stanford University, United States
  • Avanti Shrikumar, Stanford University, United States
  • Polly Fordyce, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States

Short Abstract: Transcription factors (TFs) bind genomic DNA in a sequence specific manner. ChIP-seq/exo experiments have been widely used to obtain genome-wide TF binding profiles. However, it has been challenging to relate these in-vivo binding measures to thermodynamic affinities that are typically estimated using in-vitro experiments. Here, we show that neural networks trained to accurately map genomic DNA sequences to base-resolution binding profiles of Pho4 and Cbf1 from PB-exo and ChIP-exo experiments in yeast can learn binding site flanking sequence preferences that strongly correspond to in-vitro binding energy measurements from micro-fluidic platforms. While training on data from purified in-vitro assays (PB-exo) is ideal, training on in-vivo (ChIP-exo) data still largely captured the contributions of flanking regions to binding affinities. Predictive motif representations distilled from the models identified flanking sequence preferences that strongly match those derived from in-vitro binding energies. Finally, using the model for in-silico experiments helped shed some light on favorable and unfavorable repeat sequences near binding sites that agreed with experimental measurements. Our framework enables a unified analysis of in-vitro and in-vivo TF binding assays via comprehensive in-silico interrogation of deep learning oracles.

Biologically-relevant transfer learning improves transcription factor binding prediction by deep learning models
COSI: RegSys COSI
  • Gherman Novakovsky, Centre for Molecular Medicine and Therapeutics, University of British Columbia, Canada
  • Manu Saraswat, Centre for Molecular Medicine and Therapeutics, University of British Columbia, Canada
  • Oriol Fornes Crespo, Centre for Molecular Medicine and Therapeutics, University of British Columbia, Canada
  • Wyeth Wasserman, Centre for Molecular Medicine and Therapeutics, University of British Columbia, Canada

Short Abstract: Identifying the genomic locations where transcription factors (TFs) bind is key to understanding gene regulation. ChIP-seq is an experimental method for the high-throughput identification of protein-DNA interactions in vivo. However, it is not possible to perform ChIP-seq for each of the ~1,600 human TFs in every cell. Driven by recent computational advances, deep learning approaches trained on ChIP-seq data have emerged as powerful tools for predicting TF binding. However, their performance is compromised by the limited availability of ChIP-seq data for certain TFs and cell types. Transfer learning (TL), which stores the knowledge acquired by solving one problem and applies it to a different but related problem, has been shown to reduce the amount of data required for training CNN models and to improve their performance. For instance, TL has been successfully applied to different biological tasks such as reconstructing gene regulatory networks. Yet, no in-depth study on the application of TL to TF binding prediction has been performed. Here, we explore TL using different types of biologically-relevant prior knowledge. Our results suggest that TL is improved by prior knowledge of TFs whose DNA-binding mechanisms are similar to those of the target TF (i.e. the TF for which the model is trained).

ChIP-R: Assembling reproducible sets of ChIP-seq and ATAC-seq peaks from multiple replicates
COSI: RegSys COSI
  • Rhys Newell, Queensland University of Technology, Australia
  • Richard Pienaar, The University of Queensland, Australia
  • Brad Balderson, The University of Queensland, Australia
  • Michael Piper, The University of Queensland, Australia
  • Alexandra Essebier, The University of Queensland, Australia
  • Rhys Newell, The University of Queensland, Australia

Short Abstract: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the primary protocol for detecting genome-wide DNA-protein interactions, and therefore a key tool for understanding transcriptional regulation. A number of factors, including low specificity of antibody and cellular heterogeneity of sample, may cause "peak" callers to output noise and experimental artefacts. Statistically combining multiple experimental replicates from the same condition could significantly enhance our ability to distinguish actual transcription factor binding events, even when peak caller accuracy and consistency of detection are affected.
We adapted the principles of the rank-product test to statistically evaluate reproducibility across any number of ChIP-seq experimental replicates. We demonstrate over a number of benchmarks that our adaptation "ChIP-R" (pronounced 'chipper') performs as well as or (more often) significantly better than comparable approaches on recovering transcription factor binding sites in ChIP-seq peak data, including the practice recommended by ENCODE. We also show that ChIP-R can be extended to evaluate ATAC-seq peaks, finding reproducible peak sets regardless of sequencing depth. ChIP-R decomposes peaks across replicates into "fragments" which either form part of a peak in a replicate, or not.
By re-analysing existing data sets, we show that ChIP-R reconstructs reproducible peaks from fragments with enhanced biological enrichment.
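
For readers unfamiliar with the rank-product statistic that this work builds on, the sketch below computes it for a toy set of items scored in several replicates. It only illustrates the basic statistic (the geometric mean of per-replicate ranks); it is not ChIP-R's implementation, which operates on peak fragments and derives significance values. The function name `rank_product` and the input layout are assumptions.

```python
import math


def rank_product(scores_by_replicate):
    """Geometric mean of per-replicate ranks for each item.

    scores_by_replicate: list of dicts mapping item -> score, where a higher
    score means a stronger signal. Every replicate must score the same items.
    Rank 1 is the strongest item in a replicate; ties are broken by item name.
    """
    items = sorted(scores_by_replicate[0])
    log_rank_sums = {item: 0.0 for item in items}

    for replicate in scores_by_replicate:
        if sorted(replicate) != items:
            raise ValueError("all replicates must score the same items")
        ordered = sorted(items, key=lambda item: (-replicate[item], item))
        for rank, item in enumerate(ordered, start=1):
            log_rank_sums[item] += math.log(rank)

    n = len(scores_by_replicate)
    # exp(mean(log(rank))) is the n-th root of the product of ranks.
    return {item: math.exp(total / n) for item, total in log_rank_sums.items()}
```

Items that rank consistently near the top in every replicate receive a small rank product, which is the intuition behind using it as a reproducibility measure.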

Chromatin accessibility and transcription factor activity estimation on single cells
COSI: RegSys COSI
  • Zhijian Li, RWTH Aachen, Germany
  • Kuppe, UK Aachen, Germany
  • Rafael Kramann, UK Aachen, Germany
  • Ivan G. Costa, RWTH Aachen University, Germany

Short Abstract: We propose scOpen, a computational method for quantifying the open chromatin status of regulatory regions from single cell ATAC-seq (scATAC-seq) experiments. scOpen is based on positive-unlabelled learning of matrices and estimates the probability that a region is open in a given cell, thereby mitigating the sparsity of scATAC-seq matrices. We demonstrate that scOpen improves downstream analysis steps of scATAC-seq data such as clustering, visualisation and chromatin conformation. Moreover, we show the power of scOpen and single cell-based footprinting analysis (scHINT) to dissect regulatory changes in the development of fibrosis in the kidney.

Convolutional neural network regression improves prediction of effects of common variants on gene regulatory activity
COSI: RegSys COSI
  • Easwaran Ramamurthy, Carnegie Mellon University, United States
  • Badoi Phan, Carnegie Mellon University, United States
  • Andreas Pfenning, Carnegie Mellon University, United States

Short Abstract: Understanding the functional effects of non-coding variants associated with human phenotypes remains an open challenge. Recent studies have trained convolutional neural networks (CNN) and support vector machines (SVM) to accurately predict active regulatory elements from genomic sequence, enabling querying of non-coding variant effects in silico. However, these have commonly been trained as binary classifiers, which learn to differentiate sequence examples underlying regulatory peak annotations from background genomic examples. We hypothesize that classifiers are limited in their ability to predict variant effects, since most common variants are unlikely to fully deplete regulatory activity. Rather, they tend to have smaller effects on regulatory activity. Regression models trained on absolute peak strength can learn these subtle differences, enabling better prediction of variant effects. Indeed, we validate this by showing that CNN regression models trained on open chromatin profiles of GM12878 lymphoblastoid cells (LCL) accurately predict the direction of variant effects in an independent massively parallel reporter assay (MPRA) conducted in LCLs. On the contrary, CNN classifiers trained on the same data struggle to predict variant effects in MPRA. Further, we use these results as a framework and apply CNN regression models to interpret effects of Alzheimer’s Disease associated variants in relevant cell types.

CUT&RUNTools: a flexible pipeline for CUT&RUN processing and footprint analysis
COSI: RegSys COSI
  • Qian Zhu, Dana Farber Cancer Institute, United States
  • Nan Liu, Boston Children's Hospital, United States
  • Stuart Orkin, Boston Children's Hospital, United States
  • Guo-Cheng Yuan, Dana Farber Cancer Institute, United States

Short Abstract: We introduce CUT&RUNTools as a flexible, general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CUT&RUN primary cleavage data. CUT&RUNTools extracts endonuclease cut site information from sequences of short-read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CUT&RUN. CUT&RUNTools is available at bitbucket.org/qzhudfci/cutruntools/.

dcHiC: Differential Compartment Analysis of Hi-C datasets.
COSI: RegSys COSI
  • Jeffrey Wang, La Jolla Institute for Immunology, United States
  • Abhijit Chakraborty, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States

Short Abstract: Principal Component Analysis (PCA) of Hi-C data provides a critical understanding of genome organization. The first principal component score determines A/B compartmentalization and signifies the underlying chromatin state. Although determining the component score per sample is simple, comparing this score across multiple Hi-C samples is not straightforward. This limits comparative analysis of genome organization across cell types and conditions with available Hi-C datasets. Here, we introduce a systematic approach, ‘dcHiC’ (differential compartment analysis of Hi-C), to measure principal component scores of Hi-C profiles and identify differential compartments across datasets.

dcHiC employs Hierarchical Multiple Factor Analysis (HMFA) to balance the importance of multiple Hi-C datasets before performing PCA. This normalizes component scores and helps to identify significant compartment changes across the genome. We also introduce compartmental Gene Set Enrichment Analysis or ‘cGSEA’ to find biologically relevant differential compartments across datasets. We used dcHiC to compare a total of 57 human Hi-C datasets, encompassing various biological conditions. We observed that dcHiC identified the fewest differences among replicate experiments and detected previously validated compartmental changes during cellular differentiation. Compartmentalization is a critical process and we believe that our framework will help to understand its role in genome organization and its downstream effects in a systematic manner.

De novo inference of transcription factor motifs from single cell RNA-seq data
COSI: RegSys COSI
  • Ariel Madrigal, McGill University, Canada
  • Hamed Najafabadi, McGill University, Canada

Short Abstract: Transcription factors (TFs) control cell differentiation and contribute to cellular heterogeneity. Current methods to identify TFs in single cell RNA-seq (scRNA-seq) data rely on previously identified TF motifs. However, nearly 26% of human transcription factors have no known DNA-binding motif. Here, we propose a computational framework that infers de novo transcription factor motifs using scRNA-seq data. In our method, we exploit the fact that transcriptome data is highly correlated and identify latent variables in the gene expression matrix that represent TF activities. For this, we generated a matrix that represents the frequency of k-mers (all possible DNA sequences of length k) in gene promoters. Using an integrated model that considers gene expression, a set of latent variables and the k-mer matrix, we detect k-mers that correlate with a specific TF signature. By clustering the k-mer sequences and assembling them further, we can reconstruct TF motifs. We are benchmarking our method with simulated data and with previously characterized TF motifs. Furthermore, we will apply our framework to recent scRNA-seq data derived from the Human Cell Atlas/Landscape Projects, and identify de novo TF motifs in distinct tissues and cell types across the human body.
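
The promoter k-mer matrix described above can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not say whether raw counts or relative frequencies are used (relative frequencies are computed here), and the function name `kmer_frequency_matrix` and the input layout are hypothetical.

```python
from itertools import product


def kmer_frequency_matrix(promoters, k):
    """Build a gene-by-k-mer matrix of relative k-mer frequencies.

    promoters: dict mapping gene name -> promoter DNA sequence.
    Returns (kmers, matrix), where kmers lists all 4**k k-mers in
    lexicographic order and matrix[gene] is a list aligned with kmers.
    """
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    matrix = {}
    for gene, sequence in promoters.items():
        sequence = sequence.upper()
        counts = [0] * len(kmers)
        total = 0
        for start in range(len(sequence) - k + 1):
            column = index.get(sequence[start:start + k])
            if column is not None:  # skip windows containing N or other symbols
                counts[column] += 1
                total += 1
        matrix[gene] = [c / total if total else 0.0 for c in counts]
    return kmers, matrix
```

Each row of the resulting matrix describes one gene's promoter, so it can be aligned with the rows of a gene expression matrix in a downstream model.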

Deep-EMSCAN: Deep Learning Approach to Recover Combination of Biologically Significant DNA Motifs
COSI: RegSys COSI
  • Tarun Bonu, Monash University, Australia
  • Sonika Tyagi, Monash University, Australia

Short Abstract: DNA motifs are short (5-20 bp) recurring patterns that are presumed to have a biological function. Searching for these small patterns in large genomic data (up to billions of bp) is very challenging. Several motif searching methods are available, each with its own assumptions and limitations. Here we present a pipeline that implements multiple DNA substitution models to simulate the evolution of these motifs, combined with the phylogenetic shadowing approach. The Evolutionary Motif Scan pipeline (EMSCAN) scores an individual motif instance by guessing an appropriate combination of base substitution models and phylogeny to improve the specificity of the prediction. DNA motifs may work in collaboration with one or more other motifs, and it is computationally expensive to search the possible combinations. We further built a Convolutional Neural Network (CNN) model (also known as Deep-EMSCAN) to predict the co-occurrence of motifs. The motifs are coded as images and relevant filters are placed between convolutions. The pairs of motifs that hold significance are activated, and max-pooling this layer identifies a combination. We have applied this model to successfully locate DNA/RNA protein binding sites in data generated through ChIP-seq, RIP-seq, and whole-genome sequencing experiments. The results will be presented.

Detection of Cell-type-specific Risk-CpG Sites in Epigenome-wide Association Studies
COSI: RegSys COSI
  • Xiangyu Luo, Renmin University of China, China
  • Can Yang, The Hong Kong University of Science and Technology, Hong Kong
  • Yingying Wei, The Chinese University of Hong Kong, Hong Kong

Short Abstract: In epigenome-wide association studies, the measured signals for each sample are a mixture of methylation profiles from different cell types. Previous approaches to association detection can only claim whether a cytosine-phosphate-guanine (CpG) site is associated with the phenotype or not at the aggregate level and always suffer from low statistical power. Here, we propose a statistical method, HIgh REsolution (HIRE), which not only substantially improves the power of association detection at the aggregate level compared to existing methods but also enables the detection of risk-CpG sites for individual cell types.

Differences in DNA methylation profiles in pediatric hepatoblastoma tumors
COSI: RegSys COSI
  • Rachel M. Moss, University of Minnesota, United States
  • Lauren J. Mills, University of Minnesota, United States
  • Erin L. Marcotte, University of Minnesota, United States
  • Logan G. Spector, University of Minnesota, United States
  • Jenny N. Poynter, University of Minnesota, United States

Short Abstract: Hepatoblastoma (HB) accounts for over 80% of all liver cancer diagnosed in children under 15 years old in the United States. Although it is a rare cancer, rates are increasing faster and the survival rate is lower than for most childhood cancers. The etiology of HB development is still unknown, but aberrant DNA methylation has been implicated in the pathogenesis of multiple types of cancer, including HB. To identify methylation differences, we measured DNA methylation in 88 tumor and 39 normal liver tissue samples using the Illumina HumanMethylation450 BeadChip (>480,000 CpG loci). Differentially methylated regions (DMRs) were identified between control and patient samples using the bumphunter algorithm in the minfi package. We used STRING v11 (string-db.org) to identify pathway enrichment in the genes closest to the DMRs, using the top 5% based on difference in beta values in these regions. We observed 15,470 DMRs (FWER < 0.05) associated with 5,877 unique genes. Biological process and pathway analysis indicate a role of developmental biological processes and cardiomyopathy-related pathways. Overall, we observed distinct differences in gene-specific methylation between the groups. These altered methylation patterns may point to specific exposures or biological mechanisms impacting the development of HB.

Discovery of biased orientations of regulatory motifs affecting transcription of human genes and including known insulators
COSI: RegSys COSI
  • Naoki Osato, Osaka University, Japan

Short Abstract: The overall picture of proteins associated with chromatin interactions and insulator function is still unclear. Here I describe a systematic and comprehensive approach to discover DNA-binding motif sequences of transcription factors (TFs) that affect the interaction between enhancers and promoters of genes, and the expression of those genes, through the insulator function of the TFs. This analysis identified 96 biased orientations of DNA motifs of TFs that significantly affected the expression level of putative transcriptional target genes in monocytes, T cells, HMEC and NPC in common, and included known TFs involved in chromatin interactions and insulator function such as CTCF, cohesin (RAD21 and SMC3), YY1 and ZNF143. To confirm the predicted DNA motifs, I first compared them with chromatin interaction data. Second, using gene expression data among 53 tissues, 43 (72%) forward-reverse and 40 (80%) reverse-forward orientations of DNA motifs showed significantly reduced correlation in expression level of nearby genes separated by the motif sites. Some sites of the identified DNA motifs were found between nearby genes that had low correlation of expression level and were not separated by motif sites of the known TFs. These analyses suggest that the DNA motifs are associated with insulator function.

Dynamics of the 4D genome during in vivo mouse erythroid lineage specification and differentiation
COSI: RegSys COSI
  • Robert A. Beagrie, MRC Weatherall Institute of Molecular Medicine, United Kingdom
  • A. Marieke Oudelaar, MRC Weatherall Institute of Molecular Medicine, United Kingdom
  • Matthew Gosden, MRC Weatherall Institute of Molecular Medicine, United Kingdom
  • Douglas R. Higgs, MRC Weatherall Institute of Molecular Medicine, United Kingdom
  • Jim R. Hughes, MRC Weatherall Institute of Molecular Medicine, United Kingdom

Short Abstract: Mammalian gene expression patterns are controlled by regulatory elements, which interact within Topologically Associating Domains (TADs). The relationship between activation of regulatory elements, formation of structural chromatin interactions and gene expression during development is unclear. We use in vivo mouse erythroid differentiation as a model to study these relationships at high spatial and temporal resolution. Integrated analysis of chromatin accessibility and single-cell expression data shows that regulatory elements gradually become accessible within pre-existing TADs during early differentiation. We use Tiled-C, a new low-input Chromosome Conformation Capture (3C) technique, to examine the subsequent structural re-organization within the TAD and formation of specific contacts between enhancers and promoters. Our high-resolution data show that these enhancer-promoter interactions are not established prior to gene expression, but formed gradually during differentiation, concomitant with progressive upregulation of gene activity. Together, these results provide new insight into the close, interdependent relationship between chromatin architecture and gene regulation during development.

Dysregulation of histone acetylation in Parkinson's disease brain
COSI: RegSys COSI
  • Gia T Tran, Neuro-SysMed, Department of Neurology, Haukeland University Hospital, Norway
  • Janani Sandaresan, Neuro-SysMed, Department of Neurology, Haukeland University Hospital, Norway
  • Christian Dölle, Neuro-SysMed, Department of Neurology, Haukeland University Hospital, Norway
  • Lilah Toker, Department of Clinical Medicine, University of Bergen, Norway
  • Kristoffer Haugarvoll, Department of Neurology, Haukeland University Hospital, Bergen, Norway
  • Charalampos Tzoulis, Department of Clinical Medicine, University of Bergen, Norway
  • Gonzalo S. Nido, University of Bergen, Norway

Short Abstract: Parkinson's disease (PD) is a complex neurodegenerative disorder of largely unknown etiology. While several genetic risk factors have been identified, the involvement of epigenetics in the pathophysiology of the disease is mostly unaccounted for. The tight coupling between mitochondrial dysfunction, a hallmark of the disease, and protein acetylation prompted us to assess histone acetylation in PD brain. We conducted a histone acetylome-wide association study in PD, using brain tissue from two independent cohorts of cases and controls. Immunoblotting analyses revealed hyperacetylation at several histone sites in PD, with the most prominent change observed for H3K27, a marker of active promoters and enhancers. ChIP-seq analysis further indicated that H3K27 hyperacetylation in PD is a genome-wide phenomenon, with a strong predilection for genes implicated in the disease, including SNCA, PARK7, PRKN and MAPT. Integration of ChIP-seq with transcriptomics data revealed that the correlation between promoter H3K27 acetylation and gene expression is attenuated in individuals with PD, with the strongest effects observed for nuclear-encoded mitochondrial genes. Taken together, our findings point to an interplay between aberrant mitochondrial function and dysregulation of histone acetylation in PD and indicate that dysregulation of histone acetylation plays an important role in the pathophysiology of the disease.

Enhancer prediction in the human genome by probabilistic modeling of the chromatin feature patterns
COSI: RegSys COSI
  • Maria Osmala, Aalto University, Finland
  • Harri Lähdesmäki, Aalto University, Finland

Short Abstract: The regulatory regions called enhancers are difficult to locate. Enhancers bind transcription factors and are occupied by nucleosomes with modified histones, features that are quantified by ChIP-seq assays. The ChIP-seq data is used as an input for unsupervised and supervised machine learning methods developed for enhancer prediction. However, the predictions made by different methods vary, they do not generalize between cell lines, and the choice of training data can affect the results. Moreover, the current methods do not utilize the shape of the ChIP-seq signal profiles efficiently.

We have developed a classification tool for enhancer prediction. The shape of the data density around the positive and negative examples of enhancers is probabilistically modeled. The data originates from two ENCODE cell types. The predicted enhancers are computationally validated based on DNA-binding protein binding sites. We compare our enhancer predictions to those obtained by ChromHMM and RFECS. We study the effect of choosing the non-enhancer training data. Our method predicts genome-wide enhancers which are not identified by RFECS and ChromHMM, but which still validate as enhancers. The choice of training data can have a huge effect. The choice of different parameters affects the final results.

EpiRegio: Analysis and retrieval of regulatory elements linked to genes
COSI: RegSys COSI
  • Nina Baumgarten, Goethe University Frankfurt, Germany
  • Siva Karunanithi, Goethe University Frankfurt, Germany
  • Dennis Hecker, Goethe University Frankfurt, Germany
  • Markus List, Technical University of Munich, Germany
  • Florian Schmidt, A-Star Institute Singapore, Singapore
  • Marcel Schulz, Goethe University Frankfurt, Germany

Short Abstract: Research on gene regulation is continuously expanding our understanding of how cellular identity and function are orchestrated. A current challenge is to interpret non-coding regions, so-called Regulatory EleMents (REMs), and their role in transcriptional regulation of possibly distant target genes. Identifying REMs is difficult, as there is no method yet to locate them with absolute certainty. An additional challenge is to reliably identify the target genes of the regulatory regions, which is an essential step in understanding their impact on gene expression.
We developed the EpiRegio web server, a resource of REMs and their target genes, identified by analyzing variations in gene expression across samples in combination with chromatin accessibility profiles. EpiRegio incorporates data for various human primary cell types and tissues, providing an integrated view of REMs in the genome. It allows the analysis of genes and their associated REMs, including the REM’s activity and its estimated cell type-specific contribution to its target gene’s expression. Moreover, it is possible to explore genomic regions for their regulatory potential and to investigate overlapping REMs, and thereby to dissect regions of high epigenomic complexity. EpiRegio allows programmatic access through a REST API and is freely available at epiregio.de/.

Estimating real-time transcriptional dynamics during differentiation using single-cell RNA-Seq data
COSI: RegSys COSI
  • Masato Ishikawa, The University of Tokyo, Japan
  • Hisanori Kiryu, The University of Tokyo, Japan

Short Abstract: Single-cell RNA-Seq provides a more detailed picture of the transcriptional dynamics of the cell differentiation process. By arranging the expression profiles of each cell in order of the degree of differentiation (pseudotime), we can obtain time-series gene expression data in pseudotime. However, pseudotime does not necessarily reflect real time. Therefore, time-series data in pseudotime cannot be used to (i) analyze the speed of change in expression or (ii) compare timing between differentiation paths. Accordingly, we developed an algorithm that corrects pseudotime to real time based on the recently developed RNA velocity method. Using simulated and real data, we validated that our algorithm can estimate a time closer to real time than the pseudotime estimated by existing methods. The analysis of the speed of change in expression suggests that the time-series data corrected by our algorithm can remove the bias that is introduced when using pseudotime. We also applied our algorithm to data with two differentiation paths and confirmed that it can estimate a time that reflects the known order of differentiation of each path. Thus, our algorithm is a promising method for further cell differentiation analysis that leverages real time.

Functional organization and compartmentalization of the accessible human genome
COSI: RegSys COSI
  • Alexander Muratov, Altius Institute for Biomedical Sciences, United States
  • Eric Rynes, Altius Institute for Biomedical Sciences, United States
  • Alex Reynolds, Altius Institute for Biomedical Sciences, United States
  • Athanasios Teodosiadis, Altius Institute for Biomedical Sciences, United States
  • John Stamatoyannopoulos, Altius Institute for Biomedical Sciences, United States
  • Wouter Meuleman, Altius Institute for Biomedical Sciences, United States

Short Abstract: A fundamental question in biology pertains to how the regulatory genome is organized, and what the functional implications of this organization are. We created maps of DNase I hypersensitive sites (DHSs) from 733 human biosamples and integrated these to delineate and numerically index ~3.6 million DHSs encoded within the genome, providing a common coordinate system for regulatory DNA. We show that the complex patterning of DHSs across biosamples can be captured by a simple regulatory vocabulary, providing a comprehensive and interpretable per-element annotation of human regulatory DNA. The combination of high-precision DHSs and regulatory vocabularies markedly concentrates disease- and trait-associated non-coding genetic signals along the genome and across cellular compartments. By considering DHS annotations in a larger genomic context, regulatory vocabularies enable comprehensive regulatory annotation of genes. Beyond individual genes, we identify a hitherto underappreciated degree of cellular-condition-specific compartmentalization of regulatory signal. This not only reveals the extent of regulatory domains of genes, but also provides a novel view of regulatory genome organization complementary to chromosome conformation capture assays. Taken together, our results provide a common coordinate system and annotation for human regulatory DNA, which in turn enable the quantitative and multi-scale interpretation of genetic and regulatory signals.

Gene regulation of Hedgehog interacting protein (HHIP) in Chronic Obstructive Pulmonary Disease
COSI: RegSys COSI
  • Alba Mayra Padilla, Universidad Anáhuac Querétaro Circuito Universidades I, Fracción 2, El Marqués, Querétaro, 76246, México, Mexico
  • Lucia Ramirez-Navarro, Laboratorio Internacional de Investigación sobre el Genoma Humano, Mexico
  • Walter Santana-Garcia, Laboratorio Internacional de Investigación sobre el Genoma Humano, France
  • Alejandra Medina-Rivera, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, México, Mexico
  • Ana Beatriz Villaseñor-Altamirano, Laboratorio Internacional de Investigación sobre el Genoma Humano, UNAM, Juriquilla, Mexico

Short Abstract: Chronic obstructive pulmonary disease (COPD) is estimated to become the third cause of mortality by 2030 (WHO, 2020). In PulmonDB (Villaseñor-Altamirano et al., 2020), a transcriptome database for pulmonary diseases, we found several genes with discordant gene expression patterns in COPD patients across studies. Among these genes we found Hedgehog interacting protein (HHIP), a gene previously associated with COPD (Obeidat et al., 2015). We confirmed the discordant gene expression pattern in 13 articles reporting HHIP gene expression in COPD.

In this project, we aim to assess whether HHIP expression could be affected by regulatory variants. To do so, we retrieved genetic variants in the vicinity of the HHIP locus reported in the literature. We found eQTLs with opposite effects on HHIP gene expression: 3 associated with overexpression (Obeidat et al., 2015; Wang et al., 2013; Fawcett et al., 2019) and 7 with downregulation (Zhou et al., 2013; Tam et al., 2019; Bartholo et al., 2019; Woo & Sang, 2015; Lao et al., 2015). To assess whether these genetic variants could be affecting transcription factor binding interactions, we used RSAT Variation-tools with non-redundant Jaspar motifs, finding potential binding site modifications that could explain the variation of gene expression across experiments.

Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments
COSI: RegSys COSI
  • Chris Jackson, NYU, United States
  • David Gresham, New York University, United States
  • Richard Bonneau, New York University, United States

Short Abstract: All living things regulate how they express their genes in response to the challenges they face in their environment. We need to know how genes are connected to their regulators in order to understand how an organism functions and consequently how to modify or engineer it. Single-cell RNA sequencing is a powerful new tool to measure the gene expression of individual cells out of a large population of cells. By introducing RNA barcodes into an array of mutants, we can obtain gene expression for hundreds of separately engineered strains grown in a single flask. We have chosen to do this on the model organism Saccharomyces cerevisiae (yeast), deleting individual transcription factors and growing the mutant array in a variety of environmental conditions. We have benchmarked regularized regression-based gene regulatory network inference on this single-cell yeast data using a gold standard network derived from the extensive literature available about yeast gene regulation. We found that it performs well with no changes to the underlying model assumptions. Additionally, we note that using single-cell data allows us to learn regulatory relationships between things that are heterogeneous within most samples (like the cell cycle) and things that are similar within most samples (like metabolism).

Genome-Wide Characterization of the Regulatory Relationships of Cell-Type Specific Enhancer-Gene Links
COSI: RegSys COSI
  • Caitlin Mills, University of Southern California, United States
  • Anushya Muruganujan, University of Southern California, United States
  • Dustin Ebert, University of Southern California, United States
  • Paul D. Thomas, University of Southern California, United States
  • Juan Pablo Lewinger, University of Southern California, United States
  • Huaiyu Mi, Keck School of Medicine at University of Southern California, United States

Short Abstract: Enhancers are powerful and versatile agents of cell-type specific gene regulation, many of which are thought to play key roles in human disease. Enhancers are short DNA elements that function primarily via transcriptional regulation of their target genes. These enhancer-gene links form the basis of a complex network of enhancer activity and function. Despite their involvement in disease and the establishment of cell identity during development, most enhancer-gene links remain unknown. We introduce a new database of predicted enhancer-gene links, incorporating publicly available experimental data from ChIA-PET, eQTL, and Hi-C assays across 78 cell and tissue types to link 449,627 enhancers to 17,643 protein-coding genes. These enhancer-gene links are available through the PANTHER online database where the user may easily access the evidence for each enhancer-gene link, as well as query by gene and enhancer. We have also attributed cell-type specific scores to these enhancer-gene links using various machine learning strategies. Published experimental results from the literature were gathered and used to train and statistically validate two-class classification models, which were able to correctly classify enhancer-gene links 87% of the time in multiple cell types, allowing us to provide high quality predictions for cell type-specific enhancer-gene regulatory relationships.

HiCSV: Complex structural variation detection from Hi-C data
COSI: RegSys COSI
  • Abhijit Chakraborty, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States

Short Abstract: Genomic rearrangements, as one of the driving forces behind the development of cancer, cause structural changes to the human genome. This alters the three-dimensional (3D) genome hierarchy, and eventually affects the epigenetic landscape. Among a variety of rearrangement signatures, complex structural variations like chromothripsis play a crucial role during cancer development and have been shown to reshape the genome organization. Hi-C, initially developed for decoding the 3D structure, provides an opportunity to identify the locations and investigate the impact of chromothripsis on genome organization. Characterization of chromothriptic events remains a challenge, and there is no systematic method to identify these events from Hi-C data.

We introduce a novel approach ‘HiCSV’ (Hi-C Complex Structural Variation) to detect and correct complex rearrangements at restriction-enzyme cut-site resolution from Hi-C data. We validated ‘HiCSV’ results using breakpoints identified from whole-genome sequencing on chromothripsis events in TC1-Hsa21 and HCC1954 cell lines. We observe that ‘HiCSV’ breakpoints have a corresponding change of folding pattern at rearranged locations in Hi-C data, suggesting the impact of chromothripsis on genome organization. Given the prevalence of chromothripsis in different cancers, identifying and correcting the interaction pattern from Hi-C maps will be crucial to understand their role in reshaping the genome organization in cancer.

Identification of novel virulence factors in melioidosis pathogen Burkholderia pseudomallei and their targets in human using advanced bioinformatics
COSI: RegSys COSI
  • Naveen Duhan, Utah State University, United States
  • Matthew Lister, Utah State University, United States
  • Cristian Loaiza, Center for Integrated BioSystems, Utah State University, United States
  • Rakesh Kaundal, Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, United States

Short Abstract: Burkholderia pseudomallei is a pathogenic bacterium that can attach itself to both plants and animals. When humans are exposed to it, B. pseudomallei can cause melioidosis, a chronic, treatment-resistant, and potentially fatal disease. This dangerous, resilient pathogen has been recognized by the US government as a biothreat agent. Despite its importance, only a few effector proteins have been functionally characterized, and little information is available regarding the host-pathogen Protein-Protein Interactions (PPI) of this system.
In this study, we made an in silico exploration of the host-pathogen interaction network between B. pseudomallei and human. After collecting the proteome of two strains (prototypical, and highly virulent), computational predictions were made following the homology-based interolog and the domain-domain interaction models. 14,131 HPIs were found to be unique between the prototypical strain and human, compared to 3,043 HPIs between the highly virulent strain and human. The proteins identified as interacting then underwent subcellular localization and GO enrichment. Analysis showed that most of the pathogen proteins are targeting human proteins inside the cytoplasm and nucleus. We also discovered the host targets for the drug-related pathogen proteins and identified the proteins that form T3SS and T6SS in B. pseudomallei.

Improved enhancer discovery in Drosophila and other insects
COSI: RegSys COSI
  • Hasiba Asma, State University of New York at Buffalo, United States
  • Marc Halfon, State University of New York at Buffalo, United States
  • Chad Jaenke, University of Dayton, United States
  • Michael Weinstein, University of Dayton, United States
  • Thomas Williams, University of Dayton, United States

Short Abstract: Enhancer identification is critical for understanding transcriptional regulation. We previously developed a computational method, SCRMshaw, that discovers CRMs with a high rate of true-positive predictions. SCRMshaw uses known D. melanogaster CRMs as training data to facilitate CRM discovery not just in Drosophila but in diverse holometabolous insects. We present here three approaches for increasing SCRMshaw’s effectiveness.
Firstly, we developed pCRMeval, a comprehensive pipeline for in silico evaluation of SCRMshaw. It compares prediction results with validated CRMs to calculate recovery of true CRMs. pCRMeval also assesses the performance of specific training sets.
Secondly, we are developing a method to assign each predicted CRM a “Weighted-Comparative-Confidence (WCC) score”. It compares predictions from closely related species to determine whether CRMs are predicted in conserved genomic locations. This will give a confidence measure to each prediction.
Lastly, in order to identify CRMs within a GRN for a Drosophila pigmentation trait, we used SCRMshaw on a small training set. Empirical testing of 18 predictions revealed 10 new CRMs with the same activity. These new CRMs were combined with the original training set, and the updated training set was used for a second round of SCRMshaw. The top predictions from this set did not include false-positive sequences, but did contain all previous true positives. This suggests that iterative approaches can serve to augment weak training sets to improve true-positive:false-positive ratios.

Inferring Gene Co-expression Networks from Single Cell Gene Expression Data
COSI: RegSys COSI
  • Wei Vivian Li, Rutgers, The State University of New Jersey, United States
  • Yanzeng Li, Rutgers, The State University of New Jersey, United States

Short Abstract: A system-level understanding of the regulation and coordination mechanisms of gene expression is essential to understanding the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, as a means to study transcriptome-wide gene expression measurement at single-cell resolution, it is now possible to investigate gene interactions in a cell-type-specific manner. However, the high level of sparsity in single-cell data makes it challenging to accurately infer gene co-expression networks in single cells. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and to construct functional gene networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis.

Integrated analysis of multi-omics data to identify transcriptional regulators of cancer progression
COSI: RegSys COSI
  • Saba Ghaffari, University of Illinois at Urbana-Champaign, United States
  • Remington Schmidt, Mayo Clinic, United States
  • Steven M. Offer, Mayo Clinic, United States
  • Saurabh Sinha, University of Illinois at Urbana-Champaign, United States

Short Abstract: In this study, we developed a computational framework to characterize the major regulatory mechanisms underlying colorectal cancer (CRC) progression by simultaneous analysis of dynamic gene expression and histone modification profiles that have been collected from SW480 cell lines selected for various levels of invasiveness. We built upon our previously developed probabilistic graphical model for integration of multi-omics data, called pGENMi, to identify TFs involved in CRC metastasis. Our framework considers differential expression of each gene along with evidence of that gene being under the regulatory influence of a specific TF. Data are aggregated across all genes to assign a probabilistic significance to each TF. Importantly, the cis-regulatory evidence is based on progression-associated changes in histone marks within a TF’s ChIP peak. Disruption of JUND, one of the highest ranked TFs in our analyses, confirmed involvement in the CRC invasive phenotype. Moreover, our framework identified candidate genes predicted to mediate the effects of TFs on cancer progression, thus enabling the reconstruction of the underlying gene regulatory network (GRN). We used a gene set derived from this predicted GRN as a signature to subtype CRC patient profiles from TCGA and found the subtypes to have significantly different survival outcomes.

Integrated Omics Modeling of Transcriptional Regulation in Medulloblastoma Subtypes
COSI: RegSys COSI
  • Owen Chapman, University of California San Diego, United States
  • Tobias Ehrenberger, Massachusetts Institute of Technology, United States
  • Tenley Archer, Eli and Edythe Broad Institute of MIT and Harvard, United States
  • Maxwell Gold, Massachusetts Institute of Technology, United States
  • Filip Mundt, Eli and Edythe Broad Institute of MIT and Harvard, United States
  • Miriam Adam, Massachusetts Institute of Technology, United States
  • Clarence Mah, University of California San Diego, United States
  • Karsten Krug, Eli and Edythe Broad Institute of MIT and Harvard, United States
  • Sahaana Chandran, Salk Institute for Biological Studies, United States
  • Jesse Dixon, Salk Institute for Biological Studies, United States
  • Scott Pomeroy, Boston Children's Hospital, United States
  • Ernest Fraenkel, Massachusetts Institute of Technology, United States
  • Jill Mesirov, University of California San Diego, United States
  • Lukas Chavez, University of California San Diego, United States

Short Abstract: Medulloblastoma (MB) is a relatively rare tumor of the developing cerebellum. Current treatment options carry substantial risk for lifelong cognitive and neurological impairment, and few patient tumors are targetable by known molecular therapies. Consequently, there is a pressing need to elucidate molecular mechanisms underlying MB. To identify gene regulatory circuitries that drive molecular variation within MB, we have mapped accessible chromatin and 3D chromosome conformation in 24 and 13 MB tumors using ATAC-seq and Hi-C respectively. We identify subtype-specific transcriptional regulatory chromatin regions, including a subset of regions not identifiable by the enhancer mark H3K27ac. We associate regulatory regions to their target gene promoters by correlating transcription with ATAC-seq signal strength, and confirm these relationships using the newly generated Hi-C data. Using an unpublished variant of the GSEA algorithm, which we adapted for an enrichment analysis of non-coding regulatory elements instead of genes, we identify transcription factors and master regulators with binding sites specific to each MB subtype. By associating these master regulators with their transcriptional targets, we derive models of regulatory circuitry in MB which implicate central mechanisms of MB pathogenesis and may inform future therapeutic targeting.

Learning global patterns of epigenetic variation across individuals
COSI: RegSys COSI
  • Jennifer Zou, UCLA, United States
  • Jason Ernst, UCLA, United States

Short Abstract: Many studies have identified variation across individuals in transcription factor binding, gene expression, histone modifications, and other molecular phenotypes. Although these sources of variation have been useful for understanding the regulation of specific data types at a single genomic site, it is often unclear how these data types and regions are related and how they interact in regulatory networks. In this study, we propose a method to identify global patterns of histone modifications across multiple marks and individuals that reoccur in many regions of the genome. We learn a multivariate hidden Markov model where all histone marks in all individuals are used as features by applying a “stacked” version of the ChromHMM framework. We applied this framework to a dataset of 75 individuals with 3 marks (H3K27ac, H3K4me1, H3K4me3) in the lymphoblastoid cell line (LCL) and a dataset of 93 individuals comprising autism cases and controls with 2 marks (H3K27ac, H3K4me3). We show how these global patterns of epigenetic variation across individuals can be used for three applications: 1) to improve power in local histone quantitative trait loci studies, 2) to identify putative trans-regulators, and 3) to identify regions of the genome that are enriched for complex disease risk.
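
The "stacked" arrangement described above can be illustrated with a minimal sketch. It only shows how every (individual, mark) pair can become its own feature column over genomic bins; it does not reproduce ChromHMM's input format or model learning. The function name `stack_tracks`, the toy individuals, and the binarized values are assumptions made for illustration.

```python
def stack_tracks(signals, individuals, marks):
    """Build one feature column per (individual, mark) pair.

    signals[individual][mark] is assumed to be a list of binarized
    presence calls, one value per genomic bin (hypothetical layout).
    """
    feature_names = [f"{ind}:{mark}" for ind in individuals for mark in marks]
    n_bins = len(signals[individuals[0]][marks[0]])
    rows = []
    for b in range(n_bins):
        # One row per genomic bin; columns follow the order of feature_names.
        rows.append([signals[ind][mark][b] for ind in individuals for mark in marks])
    return feature_names, rows


# Toy data: two individuals, two marks, three genomic bins (made-up values).
toy_signals = {
    "ind1": {"H3K27ac": [1, 0, 1], "H3K4me3": [1, 0, 0]},
    "ind2": {"H3K27ac": [0, 0, 1], "H3K4me3": [1, 0, 0]},
}
names, matrix = stack_tracks(toy_signals, ["ind1", "ind2"], ["H3K27ac", "H3K4me3"])
```

With 75 individuals and 3 marks, the same arrangement would give 225 feature columns per genomic bin, which a single multivariate model can then take as its observations.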

LncLOOM: a graph-based framework for mining combinations of short conserved elements in rapidly evolving sequences
COSI: RegSys COSI
  • Caroline Jane Ross, Weizmann Institute of Science, Israel
  • Amit Spinard, Weizmann Institute of Science, Israel
  • Dikla Gelbard, Weizmann Institute of Science, Israel
  • Neta Degani, Weizmann Institute of Science, Israel
  • Igor Ulitsky, Weizmann Institute of Science, Israel

Short Abstract: Thousands of long noncoding RNA (lncRNA) genes have been identified in animal genomes, a growing fraction of which have been implicated in the regulation of numerous cellular processes. This functionality has often been attributed to short conserved stretches, disguised in sequences that change rapidly, that facilitate interactions with other RNAs, proteins or genomic loci. Consequent to their rapid evolution, it is often impossible to detect significant sequence similarity in lncRNAs from species separated by >50 million years. This substantially hinders the use of comparative genomics to identify functional lncRNAs and to uncover conserved sequence elements that modulate biological function. Here we present lncLOOM, a novel graph-based framework that uses integer linear programming to identify combinations of short conserved motifs that are constrained within a set of rapidly evolved sequences. The significance of the motifs is empirically determined and the motifs are mapped to known binding sites of miRNAs and RNA binding proteins. We show that lncLOOM is a powerful approach that can efficiently uncover specific biologically relevant motifs in lncRNAs that are conserved between mammalian and fish species and motifs conserved between vertebrate and invertebrate 3’UTR sequences, even in regions where no sequence similarity is detectable by traditional alignment programs.

MEDEA: Analysis of Transcription Factor Binding Motifs in Accessible Chromatin
COSI: RegSys COSI
  • Luca Mariani, Harvard Medical School, United States
  • Kathryn Weinand, Harvard Medical School, United States
  • Stephen Gisselbrecht, Harvard Medical School, United States
  • Martha Bulyk, Harvard Medical School, United States

Short Abstract: Deciphering the interplay between chromatin accessibility and transcription factor binding is fundamental to understanding transcriptional regulation, control of cellular states, and the establishment of new phenotypes. Recent genome-wide chromatin accessibility profiling studies have provided catalogues of putative open regions, where transcription factors can recognize their motifs and regulate gene expression programs. Here, we present MEDEA (Motif Enrichment in Differential Elements of Accessibility), a computational tool that analyzes high-throughput chromatin accessibility genomic data to identify cell-type-specific accessible regions and lineage-specific motifs associated with transcription factor (TF) binding therein. To benchmark MEDEA, we used a panel of reference cell lines profiled by ENCODE and curated by the ENCODE-DREAM consortium. Comparing results with RNA-seq data, ChIP-seq peaks, and DNase-seq footprints, we show that MEDEA improves the detection of motifs associated with known lineage specifiers. We then applied MEDEA to 610 ENCODE DNase-seq datasets, where it revealed significant motifs even when absolute enrichment was low and identified novel regulators, such as NRF1 in kidney development. Finally, we demonstrate that MEDEA performs well on both bulk and single-cell ATAC-seq data. MEDEA is publicly available as part of our Glossary-GENRE suite for motif enrichment analysis.

MethMotif 2021: An update of the transcription factor binding motifs database that integrates tissue-specific features and DNA methylation profiles
COSI: RegSys COSI
  • Matthew Dyer, Faculty of Medicine, Memorial University of Newfoundland, Canada
  • Quy Lin, Cancer Science Institute of Singapore, National University of Singapore, Singapore
  • Touati Benoukraf, Faculty of Medicine, Memorial University of Newfoundland, Canada

Short Abstract: We have launched MethMotif (Lin et al. Nucleic Acids Res. 2019 Jan 8; 47: D145–D154), an integrative cell-specific database of transcription factor (TF) binding motifs coupled with DNA methylation profiles. In parallel, we developed TFregulomeR, an R library that combines an up-to-date compendium of cistrome and methylome datasets with functions to manipulate this enormous data flow (Lin et al. Nucleic Acids Res. gkz1088, DOI: 10.1093/nar/gkz1088). These resources provide functionalities that facilitate integrative analyses that enable the characterization of TF binding partners and cell-specific TF binding sites (TFBS). Using TFregulomeR, we expanded the range of information available in the new release of MethMotif by adding a breakdown of context-specific TFs’ cofactors and their corresponding gene ontology. We have shown that a TF’s target ontologies can differ notably depending on its partners. In this new release, we introduced Forked-Position Weight Matrices and Forked-Sequence Logos to better portray TF dimers. These new models better depict the TFBS of a TF of interest connected to its segregated list of partners and improve PWM models of dimerized TFs, enhancing TFBS prediction power. Overall, this update turns MethMotif into a more integrative TFBS database with a diverse set of regulatory element analysis tools accessible to a broad audience.

miRDriver: A Tool to Infer Copy Number Derived Gene miRNA Networks in Cancer
COSI: RegSys COSI
  • Banabithi Bose, Marquette University, United States
  • Serdar Bozdag, Marquette University, United States

Short Abstract: miRNAs are short non-coding RNAs that have important roles in physiology and in diseases such as cancer. In many studies, chromosomal copy number aberration (CNA) regions were found to host cancer driver genes and miRNAs. Inferring miRNA-gene interactions across the entire genome has proven insightful in cancer studies. However, inferring cancer-associated miRNA-gene interactions in CNA regions has not been studied well. In this study, we developed a computational tool named miRDriver that infers a copy number-derived miRNA-gene interaction network using multi-omics datasets such as copy number aberration, DNA methylation, and gene and miRNA expression, along with transcription factors. Using breast cancer and ovarian cancer data from The Cancer Genome Atlas (TCGA) database, miRDriver inferred miRNA regulators for differentially expressed genes in CNA regions via multivariate LASSO regression. miRDriver discovered several oncogenic miRNAs as well as known and putative miRNA-gene interactions. Inferred miRNAs were found to be significant prognostic factors within CNA regions. We compared miRDriver with other existing gene regulatory network inference methods and observed that miRDriver outperformed all of these methods.

Mustache: Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation
COSI: RegSys COSI
  • Abbas Roayaei Ardakany, La Jolla Institute for Allergy and Immunology, United States
  • Halil Tuvan Gezer, Sabanci University, Turkey
  • Stefano Lonardi, University of California Riverside, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States

Short Abstract: We present Mustache, a new method for multi-scale detection of chromatin loops from Hi-C and Micro-C contact maps using a technical advance in computer vision called scale-space theory. When applied to high-resolution Hi-C and Micro-C data, Mustache detects loops at a wide range of genomic distances, identifying structural and regulatory interactions that are supported by independent conformation capture experiments as well as by known correlates of loop formation such as CTCF binding, enhancers and promoters. Unlike the commonly used HiCCUPS tool, Mustache runs on general-purpose CPUs and is time-efficient, with a runtime of only a few minutes per chromosome for 5kb-resolution human genome contact maps. Extensive experimental results show that Mustache reports two to three times as many loops as HiCCUPS, and that these loops are reproducible across replicates. It also recovers a larger proportion of published ChIA-PET and HiChIP loops than HiCCUPS. A comparative analysis of Mustache’s results on Hi-C and Micro-C data confirms strong agreement between the two datasets, with Micro-C providing better power for loop detection. Overall, our experimental results show that Mustache enables a more efficient and comprehensive analysis of chromatin looping from high-resolution Hi-C and Micro-C datasets. Mustache is freely available at github.com/ay-lab/mustache.

Nested Tree Cell State Model Characterizes Regional and Evolutionary Changes in Neural Cell Types
COSI: RegSys COSI
  • Michael Kleyman, Carnegie Mellon University, United States
  • Jing He, University of Pittsburgh, United States
  • Bilge Esin Ozturk, University of Pittsburgh, United States
  • Cathy Su, Carnegie Mellon University, United States
  • Molly Johnson, University of Pittsburgh, United States
  • Leah Byrne, University of Pittsburgh, United States
  • William Stauffer, University of Pittsburgh, United States
  • Andreas Pfenning, Carnegie Mellon University, United States

Short Abstract: Recent advances in single cell genomics have provided new insights into the transcriptional state of neural cell types. Despite these advances, we still know little about how these cell type-specific gene expression patterns evolve across closely and distantly related species or how they vary across different brain regions. To trace the evolutionary and spatial patterns of gene expression levels across cell types, we have developed a novel machine learning method that can learn hierarchical changes of gene expression across both species and brain regions based on single cell RNA-seq data. Our method combines the concept of a species or tissue hierarchy with the evolutionary principle of maximum parsimony to extract meaningful gene expression changes across multiple single cell datasets. We applied our method to analyze the region-specific patterns and evolutionary histories of neuronal cell types. Our method was able to recapitulate known cell types and their associated marker genes, characterize the heterogeneity of each cell type that corresponded to biological pathways, and identify the stages of species, region, or cell type hierarchies at which reproducible gene expression changes occur. It also allows us to make inferences about gene interactions in neural cell types based on gene expression coevolution or regional co-expression.
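
The abstract does not say how maximum parsimony is applied, so the following is only a minimal sketch of the general idea, using Fitch's classic small-parsimony algorithm on a fixed hierarchy. The tree shape, the species names and the "high"/"low" expression states are hypothetical illustration values, not data from the poster.

```python
# Fitch small-parsimony sketch (an assumed stand-in, not the authors' method).
# A tree is either a leaf name (str) or a 2-tuple of subtrees.

def fitch(tree, leaf_states):
    """Return (candidate states at this node, minimum number of state changes)."""
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical hierarchy and discretized expression of one gene.
species_tree = (("human", "macaque"), ("mouse", "rat"))
expression = {"human": "high", "macaque": "high", "mouse": "low", "rat": "low"}

root_states, changes = fitch(species_tree, expression)
print(sorted(root_states), changes)
```

In this toy case a single change on the branch separating primates from rodents explains all four observations, which is the kind of "stage in the hierarchy" a parsimony-based approach can point to.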

Normalisr: inferring single-cell differential and co-expression with linear association testing
COSI: RegSys COSI
  • Lingfei Wang, Broad Institute of MIT and Harvard, United States
  • Jacques Deguine, Broad Institute of MIT and Harvard, United States
  • Ramnik Xavier, Broad Institute of MIT and Harvard, United States

Short Abstract: Single-cell RNA sequencing (scRNA-seq) may provide unprecedented technical and statistical power to study gene expression and regulation within and across cell types. However, due to its sparsity and technical variations, developing a superior single-cell computational method for differential expression (DE) and co-expression remains challenging. Here we present Normalisr, a parameter-free, two-step normalization-association inferential framework for scRNA-seq that solves case-control DE, co-expression, and pooled CRISPRi scRNA-seq screens under one umbrella of linear association testing. Normalisr addresses those challenges with posterior mRNA abundances, nonlinear cellular summary covariates, and mean and variance normalization. Together, these enable linear association testing to achieve optimal sensitivity, specificity, and speed in all of the above scenarios. Normalisr recovers high-quality transcriptome-wide co-expression networks from conventional scRNA-seq and robust gene regulations from pooled CRISPRi scRNA-seq screens. Normalisr provides a unified framework for optimal, scalable hypothesis testing in scRNA-seq.

On the prediction of DNA-binding preferences of C2H2-ZF domains using structural models: application on human CTCF.
COSI: RegSys COSI
  • Alberto Meseguer, Universitat Pompeu Fabra, Spain
  • Filip Årman, UPF, Spain
  • Oriol Fornés Crespo, The University of British Columbia, Canada
  • Ruben Molina, Universitat Pompeu Fabra, Spain
  • Jaume Bonet, Universitat Pompeu Fabra, Spain
  • Narcis Fernandez-Fuentes, Aberystwyth University, United Kingdom
  • Baldo Oliva, UPF, Spain

Short Abstract: Cis2-His2 zinc finger (C2H2-ZF) proteins are the largest family of transcription factors in human and higher metazoans. However, the DNA-binding preferences of many members of this family remain unknown. We have developed a computational method to predict these DNA-binding preferences. We combine information from crystal structures containing C2H2-ZF domains and from bacterial one-hybrid experiments to compute scores for protein-DNA binding based on statistical potentials. We apply the scores to compute theoretical position weight matrices (PWMs) of proteins with a DNA-binding domain composed of C2H2-ZF domains, with the only requirement being an input structure (experimentally determined or modelled). We have tested the capacity to predict PWMs of zinc finger domains, successfully predicting 2-3 nucleotides of a trinucleotide binding site for about 70% of variants of single zinc-finger domains of Zif268. We have also tested the capacity to predict the PWMs of proteins composed of three C2H2-ZF domains, successfully matching between 60% and 90% of the binding-site motif according to the JASPAR database. As an example, we have applied the approach to predict the DNA-binding preferences of the human chromatin-binding factor CTCF.

PhePrint – Identification of Functionally-related Genes from Large-scale Mining of Genetic Associations
COSI: RegSys COSI
  • Biming Wu, Regeneron Genetics Center, United States
  • Shareef Khalid, Regeneron Genetics Center, United States
  • Leland Barnard, Regeneron Genetics Center, United States
  • Chuan Gao, Regeneron Genetics Center, United States
  • Anthony Marcketta, Regeneron Genetics Center, United States
  • Jeffrey Reid, Regeneron Genetics Center, United States
  • Suganthi Balasubramanian, Regeneron Genetics Center, United States

Short Abstract: Large-scale sequencing of individuals with phenotype-rich electronic health records (EHRs) provides an unprecedented opportunity to understand genetic variants and their effect on phenotypes. Conventional approaches, such as GWAS and ExWAS, identify statistically significant associations that link genetic variants to the phenotype under study. Such associations often inspire hypotheses and investigations that aim to explain the physiological role of the corresponding genes. In contrast to single-trait associations, a pattern of associations to many phenotypes from multiple independent variants within the same gene may shed additional light on its biological role. Agnostic evaluation of such association signatures can connect less well understood genes to well-studied ones and reveal novel functional relationships. Therefore, we developed an analytical method called PhePrint for identifying pairs of genes with similar phenome-wide association patterns. Briefly, PhePrint applies weighted principal component analysis (PCA) to a gene-phenotype score matrix that estimates the relevance of a gene to a phenotype based on ExWAS summary statistics. Using association results generated from the exome sequences of 150,000 individuals of European ancestry from UK Biobank and their EHRs containing 4,276 phenotypes, we performed a pilot analysis focusing on ACAN, PCSK9, and LRP5 and agnostically predicted multiple known pathway members as well as other biologically relevant genes.

Predicting 3D genome folding from DNA sequence
COSI: RegSys COSI
  • Geoff Fudenberg, Gladstone Institutes, UCSF, United States
  • Katherine Pollard, Gladstone Institutes, UCSF, United States
  • David Kelley, Calico Life Sciences, LLC, United States

Short Abstract: In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Recent research has advanced our understanding of the proteins and sequences driving 3D genome folding, including the interplay between CTCF and cohesin and their roles in development and disease. Still, predicting the consequences of perturbing any individual CTCF site, or other regulatory element, on local genome folding remains a challenge. While disruptions of single bases can alter genome folding, in other cases genome folding is surprisingly resilient to large-scale deletions and structural variants. Convolutional neural networks (CNNs) have emerged as powerful tools for modeling genomic data as a function of DNA sequence, directly learning DNA sequence features from the data. CNNs now make state-of-the-art predictions for transcription factor binding, DNA accessibility, and transcription. Here we present Akita, a CNN that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of CTCF and reveal a complex grammar underlying genome folding. Akita enables rapid in silico predictions for sequence mutagenesis, genome folding across species, and genetic variants. Trained models, open-source code and documentation for Akita are available at: github.com/calico/basenji/tree/master/manuscripts/akita.

Predicting the effects of regulatory variants by deep learning
COSI: RegSys COSI
  • Sebastian Röner, Berlin Institute of Health (BIH), Germany
  • Max Schubach, Berlin Institute of Health (BIH), Germany
  • Louisa-Marie Krützfeldt, Berlin Institute of Health (BIH), Germany
  • Martin Kircher, Berlin Institute of Health (BIH), Germany

Short Abstract: Machine learning methods are routinely applied in medicine to prioritize and implicate genetic variants. However, pathogenic regulatory variants still account for only a minority of known variants, and quantitative readouts are not commonly available. Further, large reporter assay datasets are limited to specific loci or readouts of standing variation. While inherently biased and unsuited for machine learning, these datasets provide important validation and show that although myriad annotations and scores exist, none predicts experimental regulatory variant effects consistently.
Here, we develop a deep neural network (DNN) trained on active and non-active regulatory regions from multiple cell types. We selected open chromatin peaks from publicly available DNase data and used a multi-task convolutional network based only on genomic sequence. Our model outperforms other methods in predicting experimental variant effects. On saturation mutagenesis data of six regulatory elements, we see an improvement of 30% compared to the best competitor (ExPecto). Using recent methods of DNN interpretation, we extract critical regulatory mechanisms from our model, identifying motifs of key players in cell-type-specific transcriptional regulation. We conclude that DNNs have high potential for predicting regulatory effects and tissue-specific regulation, a critical step forward in the genome-wide interpretation of variants in common and rare disease.

Prediction and association of alternative splicing events in Breast Cancer subtypes based on meta-analysis approaches
COSI: RegSys COSI
  • Nathan De Oliveira Nunes, Institute of Mathematics and Statistics, University of São Paulo, Brazil
  • Dharshna Priya Ramasany, Department of Surgery, Anatomy Sector, Faculty of Veterinary Medicine and Animal Science; University of São Paulo, Brazil
  • Ana Claudia Oliveira Carreira, Department of Surgery, Anatomy Sector, Faculty of Veterinary Medicine and Animal Science; University of São Paulo, Brazil
  • Milton Y Nishiyama Jr, Butantan Institute, Brazil

Short Abstract: Breast cancer is the most frequently diagnosed cancer among women worldwide, with triple-negative breast cancer (TNBC) being the high-risk subtype. Alternative splicing (AS) events have been reported as signatures for prognosis and tumor progression; however, their relevance to tumorigenesis remains unknown. Normal and tumor RNA-seq data from The Cancer Genome Atlas (TCGA) dataset were downloaded and manually curated. Integrated bioinformatics analyses were conducted for subtype classification, identification of new AS events, and estimation of the respective expression profiles. Appropriate criteria for sample normalization (TMM+ComBat) and for inclusion/exclusion made it possible to eliminate batch effects and outliers. A multifactorial meta-analysis approach using the RankProduct method was established to identify tumor-related molecular markers and signaling pathways. The AS events in TNBC samples were inspected after alignment and isoform prediction with StringTie and Cufflinks. The IsoformSwitchAnalyzeR approach identified around 350 genes for each AS event type, such as intron retention, exon skipping and alternative start/termination sites. The two approaches had 37 genes in common with FDR < 0.05, highlighting the IFNAR1 and IFNAR2 genes, which showed more AS events in tumor samples, in agreement with the literature, as well as known canonical TNBC genes such as PIK3CA.
Acknowledgements: CNPq, CAPES, FAPESP

Quantitative comparison of within-sample heterogeneity scores for DNA methylation data
COSI: RegSys COSI
  • Michael Scherer, Saarland University, Germany
  • Almut Nebel, Institute of Clinical Molecular Biology, Germany
  • Andre Franke, Institute of Clinical Molecular Biology, Germany
  • Joern Walter, Saarland University, Germany
  • Thomas Lengauer, Max Planck Institute for Informatics, Germany
  • Christoph Bock, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria
  • Fabian Müller, Stanford University, United States
  • Markus List, Technical University of Munich, Germany

Short Abstract: Background: DNA methylation is an epigenetic mark with important regulatory roles in cellular identity and can be quantified at base resolution using bisulfite sequencing. Most studies are limited to the average DNA methylation levels of individual CpGs and thus neglect heterogeneity within the profiled cell populations. To assess this within-sample heterogeneity (WSH) several window-based scores that quantify variability in DNA methylation in bisulfite sequencing reads have been proposed.
Results: We performed the first systematic comparison of four published WSH scores based on simulated and publicly available datasets. Moreover, we propose two new scores and provide guidelines for selecting appropriate scores to address cell-type heterogeneity, cellular contamination and allele-specific methylation. Most of the measures were sensitive in detecting DNA methylation heterogeneity in these scenarios, while we detected differences in susceptibility to technical bias. Using recently published DNA methylation profiles of Ewing sarcoma samples, we show that DNA methylation heterogeneity provides information complementary to the DNA methylation level.
Conclusions: WSH scores are powerful tools for estimating variance in DNA methylation patterns and have the potential for detecting novel disease-associated genomic loci not captured by established statistics. We provide an R-package (github.com/MPIIComputationalEpigenetics/WSHPackage) implementing the WSH scores for integration into analysis workflows.
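
To make the idea of a window-based, read-level score concrete, here is a minimal sketch of epipolymorphism, a score from the literature defined as one minus the sum of squared frequencies of the methylation patterns seen in a window. The abstract does not list the scores that were compared, so treating epipolymorphism as one of them is an assumption, and the read patterns below are hypothetical toy data rather than output of the WSHPackage.

```python
from collections import Counter


def epipolymorphism(read_patterns):
    """Return 1 - sum(p_i^2) over distinct methylation patterns in one window.

    read_patterns: one string per read covering the window, e.g. "1011",
    where 1 marks a methylated CpG and 0 an unmethylated CpG.
    """
    if not read_patterns:
        raise ValueError("need at least one read covering the window")
    counts = Counter(read_patterns)
    total = len(read_patterns)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical windows of four CpGs covered by eight reads each.
homogeneous = ["1111"] * 8                 # every read shows the same pattern
mixed = ["1111"] * 4 + ["0000"] * 4        # two equally frequent patterns

print(epipolymorphism(homogeneous))  # 0.0
print(epipolymorphism(mixed))        # 0.5
```

Both toy windows could come from very different biology at a similar coverage, and only the read-level score separates a uniform population from a mixture of two epialleles.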

Reconstructing the Gene Regulatory Landscape of Pediatric Brain Tumors
COSI: RegSys COSI
  • Alyaa Mahmoud, University of Calgary, Canada
  • Sorana Morrissy, University of Calgary, Canada

Short Abstract: Different oncogenic hits require distinct transcriptional and developmental backgrounds to induce and maintain malignancies. In this study, we aim to reconstruct gene regulatory networks that lead to divergence from normal brain cellular development to a tumorigenic fate.

Using methods that exploit TFs and cis-regulatory sequences to infer context-specific GRNs, we obtained regulons that characterise cell states across a cohort of pediatric cancer patients (n=200) from the CBTTC consortium. This analysis provided 800 regulons that showed significant enrichment of the motifs of the corresponding transcription factors. These were then compared to regulons characterizing multiple normal brain regions ranging in age from embryonic development through adulthood. A smaller set of regulons (n=615) characterised the normal cohort, with 566 regulons overlapping between the two cohorts. Nevertheless, targets of transcription factors in the overlapping regulons were context-specific. One such example is the regulon of the proto-oncogene FLI1, which had a smaller number of targets in the normal (n=1135) relative to the tumor cohort (n=1980), with 562 overlapping targets reflecting the involvement of FLI1 in angiogenesis and vascularisation. Tumor-private targets (n=1418) were enriched in immune response terms while normal-private targets (n=573) were enriched in cellular developmental processes.
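
As a small worked example of how context-specific the FLI1 regulon is, the shared and private target counts reported above can be turned into a Jaccard index. The Jaccard index is not mentioned in the abstract; it is used here only as an illustrative summary, and the function name is hypothetical.

```python
def jaccard_from_counts(shared, private_a, private_b):
    """Jaccard index of two sets given their shared and private element counts."""
    union = shared + private_a + private_b
    if union == 0:
        raise ValueError("both sets are empty")
    return shared / union


# Counts reported for the FLI1 regulon: 562 shared targets,
# 1418 tumor-private targets and 573 normal-private targets.
fli1_jaccard = jaccard_from_counts(shared=562, private_a=1418, private_b=573)
print(f"{fli1_jaccard:.2f}")
```

Under this measure roughly a fifth of all FLI1 targets seen in either cohort are shared, which is consistent with the statement that targets are largely context-specific.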

Regulatory mechanism of low-grade inflammatory monocytes at the single cell level
COSI: RegSys COSI
  • Jiyoung Lee, Virginia Tech, United States
  • Shuo Geng, Virginia Tech, United States
  • Liwu Li, Virginia Tech, United States
  • Song Li, Virginia Tech, United States

Short Abstract: Monocytes are key innate immune cells that modulate diverse host inflammatory responses. Subclinical doses of LPS (SD-LPS) are known to cause low-grade inflammation in monocytes, which could lead to inflammatory diseases. Herein, we aim to understand the gene regulatory networks of monocytes under low-grade inflammation. We generated single-cell RNA-seq data from mouse monocytes treated with PBS, SD-LPS, 4-PBA, and SD-LPS + 4-PBA. We found 11 clusters in the single-cell RNA-seq data from the four conditions. We observed that the 11 clusters consist of both homogeneous and heterogeneous subpopulations in response to treatments. Monocytes treated with SD-LPS were separated into two clusters with distinctive expression patterns of known marker genes of the primed state (Ccl2, Ccr5, and Cd40). To define transitional states in monocytes, we inferred a trajectory over the clusters and characterized genes whose expression levels changed along the trajectory. Using published ATAC-seq data, we identified 54 transcription factors as candidate regulators, including SP2, IRF3, and STAT1. Single-cell transcriptome analysis allows us to dissect gene regulation in mouse monocytes. Based on enriched motifs, we prioritize key transcription factors and their target genes that may play important roles in low-grade inflammatory polarization of monocytes.

Solo: doublet identification via semi-supervised deep learning
COSI: RegSys COSI
  • Nicholas Bernstein, Calico Life Sciences, United States
  • Nicole Fong, Calico Life Sciences, United States
  • Irene Lam, Calico Life Sciences, United States
  • Margaret Roy, Calico Life Sciences, United States
  • David Hendrickson, Calico Life Sciences, United States
  • David Kelley, Calico Life Sciences, LLC, United States

Short Abstract: Single cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these “doublets” violate the fundamental premise of single cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach for detecting such doublets. Solo operates in the framework suggested by previous work, in which we simulate doublet cells from the observed data and train a classifier to distinguish them. We leverage the count-based variational autoencoder scVI to embed cells in a latent space, before appending an additional neural network classifier to the scVI encoder to predict doublets. Altogether, this method identifies doublets with greater accuracy than existing methods, particularly on datasets with more cells, more UMIs, and greater transcriptional complexity. Finally, we demonstrate that Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells beyond any previous approach.
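
The simulation step described above can be sketched as follows. This is a minimal illustration that assumes a doublet is modelled as the elementwise sum of the counts of two randomly chosen observed cells; the abstract does not specify the exact procedure, the function name and toy counts are hypothetical, and the scVI embedding and classifier are not shown.

```python
import random


def simulate_doublets(cells, n_doublets, seed=0):
    """Create synthetic doublets by summing the counts of two distinct cells.

    cells: list of per-cell count vectors (one integer per gene).
    """
    if len(cells) < 2:
        raise ValueError("need at least two cells to form a doublet")
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(cells)), 2)  # two different cell indices
        doublets.append([x + y for x, y in zip(cells[a], cells[b])])
    return doublets


# Hypothetical toy count matrix: 4 cells x 3 genes.
observed = [[3, 0, 1], [0, 5, 2], [1, 1, 0], [4, 0, 0]]
synthetic = simulate_doublets(observed, n_doublets=3)

# Training data for a doublet classifier: label 0 = observed, 1 = simulated.
features = observed + synthetic
labels = [0] * len(observed) + [1] * len(synthetic)
print(len(features), labels)
```

Because some observed cells are themselves undetected doublets, the label 0 is noisy, which is one reason such a setup is usually described as semi-supervised.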

StoHi-C: Using t-Distributed Stochastic Neighbor Embedding (t-SNE) to predict 3D genome structure from Hi-C Data
COSI: RegSys COSI
  • Kimberly Mackay, University of Saskatchewan, Canada
  • Anthony Kusalik, University of Saskatchewan, Canada

Short Abstract: Background: In order to understand the relationship between genomic structure and function, 3D genome structures must be predicted from biological data (like Hi-C) using computational tools. Many of these existing tools rely partially or completely on multi-dimensional scaling (MDS). MDS is known to have inherent problems when applied to high-dimensional datasets like Hi-C. Alternatively, t-Distributed Stochastic Neighbor Embedding (t-SNE) is able to overcome these problems but has not been used to predict 3D genome structures.

Objective: Develop a tool that uses t-SNE to predict 3D genomic structures from Hi-C data.

Methodology: A tool called StoHi-C (pronounced "stoic") was developed to fulfill this objective. StoHi-C is a two-step workflow that involves 3D embedding with t-SNE followed by 3D visualization. This workflow was applied to multiple fission yeast Hi-C datasets. These results were then compared to 3D genomic predictions for the same datasets generated by an MDS-based method.

Conclusion: 3D predictions from StoHi-C clearly depicted the Rabl chromosome configuration, a known hallmark of fission yeast genomic organization. This configuration was absent in the MDS-based predictions. Overall, StoHi-C was able to generate 3D genome structures that more clearly exhibit the established principles of fission yeast 3D genomic organization when compared to MDS-based predictions.

The cross-tissue landscape of metabolic receptors inferred from gene expression data
COSI: RegSys COSI
  • Judith Somekh, University of Haifa, Israel

Short Abstract: The human system responds to different conditions to achieve a state of homeostasis using ‘inter-organ’ communication. This communication is achieved through the secretion of ligands into the bloodstream from source organs, followed by binding of the ligands to their receptors located on target organs. This complex communication network and the roles receptors play in tissues are only partially understood. We present a methodology to predict the tissue-specific metabolic roles of receptors. We analysed ~700 known receptors across 25 human tissues, using RNA-Seq gene expression data from the GTEx project. We detected coordination patterns of receptor expression across tissues. We used these patterns and enrichment analysis of co-expression networks to infer metabolic roles of receptors in various tissues. Using our methodology, we predict new housekeeping metabolic roles for receptors at the whole-body level. We validate that the strongly predicted housekeeping metabolic receptors are significantly differentially expressed between metabolically related cases and controls in several datasets. Our predictions can serve as the basis for the development of new receptor-targeting drugs and for understanding drug side effects.

The role of matrix metalloproteinases in the pathology of radiation-induced oral mucositis
COSI: RegSys COSI
  • John Dillon, University of Toledo, Department of Biological Sciences, United States
  • Jessica Saul-McBeth, University of Toledo, Department of Biological Sciences, United States
  • Jacqueline Kratch, University of Toledo, Department of Biological Sciences, United States
  • Heather Conti, University of Toledo, Department of Biological Sciences, United States

Short Abstract: Oral Mucositis (OM) is a deleterious side effect of radiotherapy targeting the head and neck. The severe ulcers that form can result in increased hospitalizations and cessation of cancer treatment. Effective therapies without side effects are lacking for OM. In order to develop more successful treatments, a better understanding of the pathophysiology of OM is necessary. Following damage, matrix metalloproteinases (MMPs) are essential for tissue remodeling and leukocyte trafficking. However, if expressed in excess, MMPs can cause disproportionate inflammation and impede healing. Because the roles of MMPs during OM are not completely understood, we assessed expression of MMPs during peak damage. RNA-seq was performed comparing sham to irradiated mice, and a total of 1892 genes were differentially expressed, with 407 genes significantly upregulated and 271 genes significantly downregulated in the irradiated tongue tissue. Of note, MMP10, 1a/b, 8, 13, 12, and 27 showed a log2 fold change ≥ 2 and a P-value ≤ 0.05. Expression of these genes was verified by qPCR. In all, radiation induces transcription of MMPs that may contribute to the pathology of OM. Future studies will investigate the kinetics of MMP gene expression to gain a better understanding of each MMP during the course of OM in order to inform drug development.

Thermodynamics-based modeling of enhancer RNA transcription in breast cancer cells
COSI: RegSys COSI
  • Shayan Tabe Bordbar, UIUC, United States
  • Saurabh Sinha, University of Illinois at Urbana-Champaign, United States

Short Abstract: Estrogen Receptor α (ERα) is a major lineage-determining Transcription Factor (TF) in mammary gland development. Dysregulation of the ERα-mediated transcriptional program results in cancer. In order to characterize the cis-regulatory code underlying this complex transcriptional program, we analyzed enhancer RNAs (eRNAs) known to be regulated during estrogen response in breast cancer cells. eRNAs are short bidirectional RNAs transcribed from non-protein coding regions of the genome and are known to be a reliable marker of active enhancers. In this study, through sequence analysis, we aim to understand the regulatory program that controls eRNA transcription in breast cancer cells. First, using standard classification techniques, we identified several sequence features that distinguish eRNAs transcribed upon estrogen treatment. Next, we incorporated these features into a thermodynamics-based model to gain insights into the role of several TFs in the ERα-mediated transcriptional program. The main TFs identified by the model as driving this transcriptional program are in line with previous reports. Additionally, we identified putative TF-enhancer regulatory connections and integrated these findings with experimentally determined enhancer-promoter interactions to construct an eRNA-informed Gene Regulatory Network (GRN) that can be exploited to provide mechanistic insights into sequence-mediated aberrations that trigger the malignant transformation of normal cells.

Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning
COSI: RegSys COSI
  • Hani Jieun Kim, The University of Sydney, Australia

Short Abstract: The developmental potential of cells, termed pluripotency, is highly dynamic and progresses through a continuum of naive, formative and primed states. Pluripotency progression of mouse embryonic stem cells (ESCs) from the naive to the formative and primed states is governed by transcription factors (TFs) and their target genes. Genomic techniques have uncovered a multitude of TF binding sites in ESCs, yet a major challenge lies in identifying target genes from functional binding sites and reconstructing the dynamic transcriptional networks underlying pluripotency progression. Here, we integrated time-resolved ‘trans-omic’ datasets together with TF binding profiles and chromatin conformation data to identify target genes of a panel of TFs. Our analyses revealed that naive TF target genes are more likely to be TFs themselves than those of formative TFs, suggesting denser hierarchies among naive TFs. We also discovered that formative TF target genes are marked by permissive epigenomic signatures in the naive state, indicating that they are poised for expression prior to the initiation of the pluripotency transition to the formative state. Finally, our reconstructed transcriptional networks pinpointed the precise timing of the progression from naive to formative pluripotency and enabled the spatiotemporal mapping of differentiating ESCs to their in vivo counterparts in developing embryos.

Widespread effects of DNA methylation and intra-motif dependencies revealed by novel transcription factor binding models
COSI: RegSys COSI
  • Jan Grau, Martin-Luther-Universität Halle-Wittenberg, Germany
  • Florian Schmidt, A-Star Institute Singapore, Singapore
  • Marcel Schulz, Goethe University Frankfurt, Germany

Short Abstract: Accurate models describing the binding specificity of transcription factors (TFs) are essential for a better understanding of transcriptional regulation. Aside from chromatin accessibility and sequence specificity, several studies suggested that DNA methylation impairs or enhances TF binding. However, currently available TF motif inference and TF binding site (TFBS) prediction approaches do not adequately incorporate DNA methylation.
Here, we present MeDeMo (Methylation and Dependencies in Motifs), a novel framework for TF motif discovery and TFBS prediction that incorporates DNA methylation by extending LSlim models.
We conducted a large-scale study of DNA methylation sensitivity for 144 TFs using MeDeMo on over 600 ENCODE ChIP-seq datasets with matched bisulfite-seq data on four cell types.
For many TFs, using DNA methylation information leads to better discrimination between TF-bound and unbound sequences. In addition, we show that dependencies between nucleotides, captured by MeDeMo, are essential to represent DNA methylation.
Overall, we find that DNA methylation has a detrimental influence on TF binding, and that this influence differs among TFs of the same family.
MeDeMo for motif discovery is available i) as stand-alone binary versions offering a graphical and a command line interface at www.jstacs.de/index.php/MeDeMo, and ii) at github.com/Jstacs/Jstacs.